VOLUME 63 


WHOLE No. 297 
1949 


NUMBER 2 


Psychological Monographs: 
General and Applied 


Combining the Applied Psychology Monographs and the Archives of Psychology 
with the Psychological Monographs 


HERBERT S. CONRAD, Editor 


A Comparative Study of the 
Wherry-Doolittle and a Multiple 
Cutting-Score Method 


By 
GLEN GRIMSLEY 


General Motors Institute 
Flint, Michigan 


Accepted for publication, August 1, 1948 


Price $0.75 


Published by 


THE AMERICAN PSYCHOLOGICAL ASSOCIATION 
1515 MASSACHUSETTS AVE., N.W., WASHINGTON 5, D.C. 




COPYRIGHT, 1950, BY THE


AMERICAN PSYCHOLOGICAL ASSOCIATION 


PREFACE 


THIS STUDY was done in 1946 at The University of
Southern California, where it was submitted, under
the title: A Comparative Study of the Wherry-Doolittle
and Multiple Cutting-Score Methods, in partial fulfillment
of the requirement for the degree Doctor of Philosophy.

The writer wishes to express his appreciation for the 
assistance of the members of his committee: Floyd L.
Ruch, Milton Metfessel, R.R.G. Watt, D. W. Lefever, and 
Melvin Vincent. 


GLEN GRIMSLEY 



TABLE OF CONTENTS


I. THE PROBLEM AND DEFINITION OF TERMS USED 


A. The Problem 


1. Statement of the problem 


2. Importance of the study 


B. Definition of terms 


1. Multiple cutting-score method 


2. Wherry-Doolittle method 


3. The standard group 


4. The prediction group 


II. REVIEW OF THE LITERATURE


A. Multiple correlation methods of selecting and weighting tests in a
battery


B. Other weighting techniques


C. The use of critical scores


D. Recent studies comparing the multiple cutting-score and multiple
correlation method


III. THE SUBJECTS, MATERIALS, AND CRITERION USED


A. The subjects


B. The materials


C. The criterion


IV. METHOD AND RESULTS OF THE STUDY


A. The Wherry-Doolittle Method


B. The Multiple Cutting-Score Method


C. Application of the critical scores and regression weights to the
prediction group


D. Comparison of the battery selected by the multiple cutting-score
method and the battery selected by the Wherry-Doolittle method


V. A SUGGESTED MODIFICATION OF THE CUTTING-SCORE METHOD


VI. SUMMARY AND DISCUSSION OF RESULTS




CHAPTER I
THE PROBLEM AND DEFINITION OF TERMS USED


ONE OF the most important practical applications of psychological
measurement consists in the prediction of probable achievement
(academic, vocational, or social) from scores derived from tests,
interviews, and biographical data sheets. As it is seldom possible to
achieve a high degree of predictive efficiency by using a single
predictive instrument, methods have been developed for the selection
of batteries of measuring instruments and for the combination of
several scores into a single prediction score.

Of these techniques, the multiple correlation method generally has
yielded the most highly predictive scores and has been most widely
used. However, there are two disadvantages to the use of the method:
it can be used only by a trained statistician, and it is a lengthy
process. These disadvantages are particularly noticeable in the field
of personnel selection, because of the need to keep costs down and
because of the lack of statistical training among personnel workers.

In attempts to meet the need for a simpler and less costly method of
selecting batteries and combining prediction variables, critical-score
techniques have been developed by Thurstone (19), Johnson (11),
Franzen (2), Ruch (16) and others.*

The particular technique with which this study is concerned is one
developed by Ruch (16). This method (referred to in this study as the
multiple cutting-score method) requires only about one-third as much
calculation time as the multiple correlation method and can be used by
a clerk with no training in statistics. Studies by Ruch (17), Wahoske
(21) and the writer (4) have shown the method to be approximately
equal to the multiple correlation method in the selection of batteries
of tests predictive of vocational success in some fields.

*These contributions will be discussed in Chapter II.


A. THE PROBLEM 


1. Statement of the problem. It was 
the purpose of this study to compare 
the multiple cutting-score method and 
the Wherry-Doolittle multiple correla- 
tion method for the selection of a 
battery of tests to be used in the pre- 
diction of course grades in elementary 
accounting classes at The University of 
Southern California. Answers were 
sought to the following questions: (a) 
will the batteries selected by the two 
methods include the same tests? (b) 
which method will select the more pre- 
dictive battery? 

2. Importance of the study. The ulti- 
mate goals of any science are practical 
prediction and control. Therefore any 
improvement in prediction technique is 
of great importance. From the theoreti- 
cal point of view an improvement con- 
sists largely in an increase in predictive 
efficiency. However, from a practical
point of view a methodological improvement
may consist merely in a simplification
of method.

If it is found that the multiple cutting-score method yields
batteries of tests which have greater predictive efficiency than those
developed by the multiple correlation method, the findings are of
considerable theoretical importance; or if it is found that the
multiple cutting-score method is approximately equal in efficiency,
the finding is of great practical significance, because of the fact
that it requires little time and no statistical training on the part
of the user.


B. DEFINITION OF TERMS 


1. Multiple cutting-score method. This 
name is used to refer to a method de- 
veloped by Ruch (16). It consists essen- 
tially of the following steps: 

a. Record each individual's test scores 
and his criterion score on a 5” X 8” card.

b. Separate the cards into two or more 
groups on the basis of criterion scores 
(e.g., into upper and lower halves, thirds, 
or fourths). The division may be made 
as fine as seems desirable, consistent with 
the total number of cases. However, the 
division should not be so fine as to 
make the means of criterion or test 
scores of the sub-groups too unreliable. 

c. Calculate the mean scores of all the separate tests for each of
the groups established by step b. Round the means to the nearest whole
number, always rounding those ending in .5 to the next-higher whole
number. The means thus established are the cutting-scores. The mean of
the test scores of the upper criterion group (mean of upper half,
upper quarter, or other highest criterion group) becomes the A
standard, the mean of the test scores of the next-highest criterion
group becomes the B standard, etc.

d. Sort the cards on the basis of the 
scores on one test at a time and fill in
Table I. (It should be noted that if a 
test score meets the A standard, it also 
meets all lower standards. Therefore, the 
mean and per cent values in Table I, 
and also in Table II, are cumulative 
from left to right. For example, the 
values for the mean and per cent 
columns under standard D are based on 
all cases with test scores at or above the 
D critical score.) 

e. On ordinary graph paper plot the 
criterion means against the per cent 
selected. Connect by straight lines the 
A, B, C, D . . . points for each test. (See
Figure 1, page 12.) 

f. From this graph determine by in- 
spection the order of usefulness of the 
tests and number them in that order. 

g. Sort the cards into the groups meeting standard A, B, C, D, etc.
on the basis of test number one, and discard all who fail the lowest
standard. Record the results in Table II below.

Sort the cards not discarded above on the basis of scores on test
number two, discarding those who fail, and record the results in the
table. Continue this process until cards have been sorted on the basis
of all test scores. If, after adding any test to the battery, it is
found that the predictive efficiency of the battery has been reduced
by the addition of that test, it should not be used. For example, if
the addition of test 3 to the battery made up of tests 1 and 2 reduces
the mean criterion scores of the groups selected, test 3 should not be
used. The cards or cases should be sorted back into the groups
selected according to tests 1 and 2, and test 4 should be tried next.
The Test Battery column in Table II would then be modified
accordingly. (It should be noted that the effect of step g is to
classify each case according to his lowest test score. For example, an
individual might meet the A standard on tests 1, 2, 3, and 4; however,
if his score on test 5 were below the A standard that individual would
fall in group B, C, or D according to the standard met at test 5.)


TABLE I

Mean criterion score and per cent selected by each standard or critical score

          Standard A     Standard B     Standard C     Standard D
Test      Mean    %      Mean    %      Mean    %      Mean    %


TABLE II

Mean criterion score and per cent selected by each standard or critical score

Test              Standard A     Standard B     Standard C     Standard D
Batteries         Mean    %      Mean    %      Mean    %      Mean    %
1
1 & 2
1, 2, & 3
1, 2, 3, & 4


h. Plot the criterion means against the percentages from Table II in
the same manner as in step e. (See Figure 2, Chapter IV.)

i. Determine the most useful battery by inspection. (See footnote on
page 11 for inspection procedure.)

2. Wherry-Doolittle Method. This method (23) is a modification of the
longer Doolittle method. It yields the multiple correlation corrected
for the chance error attendant upon the addition of each test. When
the chance error added by a test is greater than the increase in
validity of the battery the multiple R begins to diminish. After the
final battery has been selected, beta weights are computed for each
test and converted into b weights for use in computing each
individual’s predicted criterion-score.

3. The Standard Group. The standard 
group consisted of 250 students of ele- 
mentary accounting. Their test scores 
and course grades were used in the de- 
velopment of the regression weights for 
the Wherry-Doolittle battery and the 
critical scores for the multiple cutting- 
score battery. 

4. The Prediction Group. This group 
consisted of 250 students of elementary 
accounting who were paired with the 
members of the standard group on the 
basis of criterion scores. Their grades 
were “predicted” by means of the
Wherry-Doolittle and the multiple cut- 
ting-score batteries which had been de- 
veloped on the standard group. 
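
The sorting procedure of step g in the definition of the multiple cutting-score method can be expressed as a short program. The following is a minimal sketch in Python and is not part of the original study: the critical scores, the test numbers, and the function names are hypothetical, and four standards (A to D) are assumed. Each case receives the lowest standard it meets on any test in the battery, and a case that fails the lowest standard on any test is discarded.

```python
# Hypothetical critical scores: test number -> cutting-scores for
# standards A, B, C, D (highest standard first). Illustrative values only.
LABELS = ["A", "B", "C", "D"]
BATTERY = {
    1: [14, 12, 10, 8],
    2: [20, 17, 15, 12],
}


def standard_index(score, cuts):
    """Index of the highest standard met by one test score, or None."""
    for i, cut in enumerate(cuts):
        if score >= cut:
            return i
    return None  # the score fails even the lowest standard


def classify(scores, battery=BATTERY):
    """Classify one case by the lowest standard met on any test.

    scores: dict mapping test number -> test score.
    Returns a standard label, or None if the case is discarded.
    """
    worst = 0
    for test, cuts in battery.items():
        i = standard_index(scores[test], cuts)
        if i is None:
            return None
        worst = max(worst, i)  # a later index means a lower standard
    return LABELS[worst]


print(classify({1: 15, 2: 16}))  # meets A on test 1 but only C on test 2
```

Because a single low score fixes the classification, a high score on one test cannot offset a low score on another in this sketch.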




CHAPTER II
REVIEW OF THE LITERATURE 


THE PARTICULAR multiple cutting-score
technique used in this study
is of such recent development that there 
are few studies of its efficiency. However, 
a great deal has been written concern- 
ing the general statistical problems of 
prediction. The remainder of this chap- 
ter will be devoted mainly to a review of 
(a) those variations of the multiple corre- 
lation technique which have been de- 
veloped in an attempt to simplify the 
procedure; and (b) studies of the various 
uses of cutting-scores. 


A. MULTIPLE CORRELATION METHODS OF SELECTING 
AND WEIGHTING TESTS IN A BATTERY 


Many schemas have been developed for use 
in the calculation of multiple R and the solu- 
tion of the multiple regression equation (7). 
One of the most widely used is the Wherry 
modification of the Doolittle method (23, 18) 
which was described in Chapter I. In spite of 
simplification it remains a laborious process and 
requires considerable statistical knowledge on 
the part of the user. 

In order to permit a simple equation for cal- 
culation of exact predicted criterion-scores, all 
correlation methods assume linearity of regression
and homoscedasticity. This assumption is
probably justified in the majority of cases, but 
the possible lack of normality and linearity is 
disturbing in the application of a method 
which assumes them. Some practical difficulties 
in the use of multiple regression weights are 
stated by Guilford (6): 


Multiple-regression weights may not always 
be the best ones for the parts of a battery. 
Simply summing scores without weighting ob- 
tained scores sometimes yields as good results as 
with the use of b coefficients. Certainly most b 
coefficients are awkward to use in practice, be- 
cause they are not simple integers. If weighting 
is to be based upon this principle, small integral 
weight values will serve just as well. 


Guilford then suggests, by way of example, that 
if b coefficients of .65, 1.12, and 2.35 are ob- 
tained, they might be rounded to 1, 2, and 3 
respectively, and that the effect of such rounding 
would probably be trivial (6).
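
The size of the effect of such rounding can be illustrated under a simplifying assumption that is not made in the source: if the three tests are uncorrelated and have equal variances, the correlation between the composite formed with the exact b coefficients and the composite formed with the integral weights equals the cosine of the angle between the two weight vectors. A minimal Python sketch of that calculation follows; the function name is illustrative only.

```python
import math


def composite_correlation(w, v):
    """Correlation between sum(w_i * x_i) and sum(v_i * x_i).

    Assumes (for this sketch only) that the x_i are uncorrelated and
    have equal variances, so the result is the cosine between w and v.
    """
    dot = sum(a * b for a, b in zip(w, v))
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_w * norm_v)


exact = [0.65, 1.12, 2.35]   # b coefficients from Guilford's example
rounded = [1, 2, 3]          # the integral weights he suggests
print(round(composite_correlation(exact, rounded), 3))
```

Under that assumption the two composites correlate about .99; with correlated tests or unequal variances the figure would differ.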

If such rounding of weights is permissible, it 


is hard to see the practical usefulness of the 
lengthy process of obtaining b coefficients. These 
questions concerning the value of mathematical 
refinement of method have led numerous writers 
to propose methods of approximating regression 
weights.

In 1938 Wherry (24) published two formulas 
for estimating beta weights. He states that the 
purpose of the method is to “(1) decide whether 
the exact values are worth obtaining, (2) obtain 
immediate approximation answers which the 
investigator might otherwise have to wait hours 
or days to obtain, and (3) select certain variables 
from a larger group which may then be given 
an exact solution.” 

In 1940 Flanagan (1) proposed a successive 
approximation technique for use when the 
number of variables is large; however, it appears 
to involve too much calculation to justify the 
loss in predictive efficiency. Jackson (9) in 1943 
criticised Flanagan’s method and published a 
series of formulas for obtaining approximate 
regression weights. While it is true that his 
methods reduce the amount of calculation in- 
volved in obtaining weights, here, too, there was 
a considerable loss in predictive accuracy. 
Furthermore, Jackson offered no evidence that 
such predictive accuracy as he was able to show 
on one set of data would hold up in all situa- 
tions. 

Jenkins (10) has published (1946) a quick 
graphic method for approximating r, R and 
partial r’s. The approximations are very close 
with a large sample, but with twenty small 
samples of fifty cases one of his graphically 
derived r’s deviated .15 from the calculated r. 
If such deviations are considered as errors, the 
chance of compounding them in calculation of 
R is very great. However, it must not be as- 
sumed that a failure to arrive at the same 
value for the relationships as by the product- 
moment method is evidence for the superiority 
of the product-moment approach. Until a com- 
parative study has been made of the predictive 
efficiency of batteries selected by the two 
methods, the value of Jenkins’ graphic method 
is not clearly established or gainsaid. 


B. OTHER WEIGHTING TECHNIQUES 


Richardson (15) states (1941) that the in- 
discriminate adoption of the multiple correla- 
tion approach has obscured the general problem 
of combining measures. He discusses several 
weighting methods which do not depend upon 
correlation. It is shown that the practice of 
weighting test scores in terms of the reciprocals 




of the standard deviations of their distributions 
may reduce the reliability of the composite 
score (15). All of the formulas which he presents 
for use in combining prediction variables are 
to be used only with variables that are positively 
correlated and linear in their relationships. 

Another purported simplification is the 
“L-method” of Toops (20). This method selects 
the most useful variables and sums them for a 
composite score. Derivatives of the general 
formula can be used to select tests for batteries, 
to select items for tests, or to obtain reliability 
and validity coefficients. Whatever the ad- 
vantages of the “L-method,” simplicity certainly 
is not one of them, for Hollerith cards and ma- 
chine calculation are required. Stead (18) states 
that the “L-method” is more lengthy than the 
Wherry-Doolittle multiple correlation technique. 

In summary it might be said that these 
methods that have been developed in an at- 
tempt to obtain approximate regression weights 
have proved disappointing in practice. Those 
that are easy to apply (e.g., weighting test 
scores in terms of the reciprocal of the standard 
deviation) result in considerable losses in pre- 
dictive efficiency, and those, like Toops’ 
“L-method,” which compare favorably with the 
multiple correlation method in predictive ac- 
curacy, have turned out to be more compli- 
cated than the method they were designed to 
supplant. 


C. THE USE OF CRITICAL SCORES


The use of critical scores or cutting-scores on 
a single variable is not new. In educational examinations
a series of critical scores have long
been used in the grading of students. Likewise,
in the field of psychological testing the idea of 
critical scores is inherent in the concept of pre- 
diction from measurement. 

In 1919 Thurstone (19) published a prediction
method based on critical scores. Upper and
lower critical scores were obtained by plotting 
a scatter diagram of test scores against instruc- 
tors’ ratings. The upper critical score was one 
above which every student was rated above 
average and the lower critical score was one 
below which every student was rated below 
average. Some tests were found to discriminate 
at only one end and thus yielded only one 
critical score. He developed a rough method of 
combining the results from several tests by 
scoring each student in terms of the number of 
tests upon which he exceeded the critical score. 
The best battery was the one which eliminated 
the most failures without eliminating any suc- 
cessful students. Thurstone found that the 
multiple correlation was more difficult to calcu- 
late and was less useful for individual diagnosis 
in a group of 114 college freshmen. 


In the same year Pressey (14) published a 
similar study. He made simple frequency distri- 
butions of students’ test scores which he divided 
into centiles. Subsequently, as students failed in 
their school work, their scores in the distribution 
were marked. Thus by inspection it was possible 
to determine the usefulness of each test for pre- 
diction of student failures. 

Horst (8) gives (1941) a formula for obtaining 
the cutting-score which will exclude the largest 
percentage of failures and at the same time 
retain the largest percentage of successful work- 
ers. However, his rational approach based on the 
assumption of the normal curve makes his 
multiple cutting-score method of doubtful value 
in the practical situation in which normal distribution
curves are seldom found. In 1943
Franzen (2) published a description of a multi- 
ple chi technique which he found to be more 
useful than the multiple correlation method for 
the prediction of a categorical criterion. How- 
ever, the multiple chi technique appears to 
require at least as much calculation time as the 
traditional correlation method. 

Johnson (11) published (1944) an ingenious 
method of reducing the time required to select 
the best battery of tests, attributes, or best 
group of items. His method seems likely to 
prove most useful when prediction is to be made 
from a large number of attributes rather than 
from tests, for his system makes use of only 
one critical score for each variable. The method 
could be modified to yield a series of “job ability 
standards,” either by using a large number of 
variables and calculating the probability of 
success in terms of the most predictive of several 
batteries which an individual might succeed on, 
or by establishing a series of “cutting scores” on 
each test. However, either of the above modifi- 
cations would increase the calculation costs so 
greatly as to destroy its advantage over the 
multiple correlation approach. 


D. RECENT STUDIES COMPARING THE MULTIPLE
CUTTING-SCORE AND MULTIPLE CORRELATION 
METHOD 


In 1943 Ruch (16) published his multiple 
cutting-score method (described in Chapter I).
In the same year the writer (5) made use of the 
technique in establishing standards for the 
selection of trainee draftsmen. An unpublished 
comparison of the results obtained by multiple 
cutting-scores and multiple regression weights 
showed the two methods to be equal in pre- 
dictive efficiency (4). In 1945 a comparative study 
of the relative efficiency of the Wherry-Doolittle 
and the multiple cutting score methods was 
made by the National Defense Research Com- 
mittee under the direction of Ruch (17). 

The cutting-scores and regression weights were 




calculated on one group of 410 and applied to a 
different but comparable group of 410. The 
men selected by A, B, and C cutting scores 
were compared with equal numbers of men 
selected by the Wherry-Doolittle method. The 
multiple cutting-score method was found to be 
superior to the correlational method (C.R.= 1.8) 
at the A standard (13.2 per cent selection), and 
slightly inferior at the B and C standards 
(22.9 per cent selection and 35.4 per cent selection,
respectively).

Wahoske (21) found (1946), in predicting 
rated job performance in the same population 
upon which the cutting-scores and regression 
weights were established, that the multiple 
cutting-score method was superior in predictive 
accuracy to the Wherry-Doolittle method at all 
levels of selection. The differences in favor of 
multiple cutting-scores were not individually 
significant, but appeared significant if inter- 
preted in light of the general formula for the 
probability of combined events. It was also 
found that the computation time for multiple 
cutting-scores was one-third the time required
in using the Wherry-Doolittle method.

Wahoske suggested that the superiority of the 
cutting-score approach probably depended upon 
its avoidance of two unjustified assumptions 
which the multiple regression method requires: 
namely, linearity and compensation. These as- 
sumptions are discussed below. 

It is common knowledge that regression lines 
encountered in personnel work are not always 
exactly linear. In these cases product-moment 
correlation coefficients are too small to represent 
the true predictive value of a test. Moreover, the 
assumption of compensation, that high ability 
in one factor can compensate for low ability in 
another, may not be valid in all predictive 
situations. Wahoske implies that the superiority 
of the multiple cutting-score method is due to 
the fact that the only assumption required is 
that the group upon which the cutting-scores 
are established is representative of the popula- 
tion to be predicted (21). A fairer statement 
would be that the multiple cutting-score method 
requires no assumption regarding linearity of 
regression and assumes non-compensation. 




CHAPTER III 
THE SUBJECTS, MATERIALS, AND CRITERION USED 


THE DATA necessary for a comparative study of multiple correlation
and multiple cutting-score methods might have been derived from many
different kinds of measuring devices. The only requirements were that
the distributions of measurements be continuous and that they show
some degree of correlation with some continuous distribution of
criterion scores. Because these conditions are most frequently found
in prediction of performance from psychological test scores such a
prediction situation was chosen for this study.


A. THE SUBJECTS


Five hundred students of elementary accounting at The University of
Southern California served as subjects. Fifty-six per cent were
freshmen, 25 per cent sophomores, 13 per cent juniors, and 6 per cent
were seniors, graduates, and special students. Ninety-three per cent
were male. Twenty-three per cent were majoring in accounting, and most
of the remaining 77 per cent were majoring in some other department of
the school of commerce. Eighty-six per cent were veterans. Their ages
ranged from seventeen to thirty-eight, with a median of twenty-four
years. Most of them had had some business experience; and 55 per cent
reported more than twelve months of such experience.


B. THE MATERIALS


In order to keep the time spent in test administration at a minimum,
seven five-minute tests were constructed. The tests were not named but
were simply numbered from one to seven inclusive.

The items used in tests number two and three were developed by
Lovell (12).

In scoring, all tests were corrected for chance success by the
standard formula,

$$\text{Score} = R - \frac{W}{N - 1}$$

in which R = the number answered correctly, W = the number answered
incorrectly, and N = the number of alternatives offered.


TABLE III

THE DISTRIBUTION OF CRITERION SCORES

Criterion Score    Frequency
98                     1
95                    10
94                    19
93                    17
92                    34
87                    23
84                    17
81                    25
78                    20
72                    13
69                     4

Entries for the remaining criterion scores are illegible in the source copy.




C. THE CRITERION 


The criterion of success in elementary 
accounting was the semester grade as- 
signed each student by the accounting 
department. No check of the reliability 
of the grades was made. For, while un- 
reliability in the criterion might reduce 
its correlation with the tests, it had no 
bearing upon the purpose of this study. 


The grades assigned were based upon 
the entire semester’s work, including a 
final examination. As shown in Table
III, the distribution of criterion scores
was negatively skewed, due to the drop- 
ping out of the poorer students who 
would have failed or received low marks 
had they continued. 




CHAPTER IV
METHOD AND RESULTS OF THE STUDY


AFTER the subjects had been tested and the tests had been scored, a
5” X 8” card was made for each of the 500 subjects in order to
facilitate sorting and computation. This card contained: subject’s
name, instructor’s name, class section, subject’s criterion score
(course grade), and the subject’s scores on the seven five-minute
tests.

The cards were then divided into two groups, a “standard group” and a
“prediction group.” This was done upon the basis of criterion scores,
and the division was made class-section by class-section, so that each
of the seventeen sections would be equally well represented in each of
the two groups of 250 each. Because it was necessary to pair subjects
on the basis of class section as well as criterion score, it was not
possible to obtain two groups in which the individuals were exactly
paired on criterion scores. However, as can be seen in Table IV, the
distributions of criterion scores for the prediction group and the
standard group are very nearly identical. Table V shows the means,
standard deviations, and measures of skewness for the distributions of
test scores for the standard and the prediction groups.

After the cards had been divided into the two groups, those
constituting the prediction group were filed away. A description of
the correlation coefficients, regression weights, and multiple
cutting-scores calculated for the standard group is given in the
sections which follow.


TABLE IV

FREQUENCY DISTRIBUTION OF CRITERION-SCORES OF STANDARD AND PREDICTION GROUPS

Criterion    Standard    Prediction
Score        Group       Group
95               5            5
94               9           10
93               9            8
92              17           17
91              12           11
90              16           16
87              11           12
86              14           14
85              10           10
84               8            9
83              11           10
82               7            8
Total          250          250
M             83.6         83.7
σ             7.86         7.81
Sk*              ?         -.46

* Sk = 3(Mean - Median)/σ
Rows for the remaining criterion scores are illegible in the source
copy; ? marks an illegible entry.


TABLE V

MEANS, STANDARD DEVIATIONS, AND MEASURES OF SKEWNESS OF THE
DISTRIBUTIONS OF TEST SCORES

          MEAN               STANDARD DEVIATION     SKEWNESS*
Test      S.G.**   P.G.      S.G.     P.G.          S.G.    P.G.
1           ?        ?         ?      3.36          -.03    +.01
2         10.96    10.89     3.49       ?             ?     +.03
3          8.64     8.44     3.01     3.03          -.08      ?
4         14.85    15.52     6.38     6.33          +.19    +.15
5         78.01      ?      16.98    14.40          +.42      ?
6         13.59    13.60     8.69     8.45          -.10    -.17
7         16.05    15.62     4.39     4.76          -.33    -.32

* Sk = 3(M - Mdn)/σ
** S.G. = Standard Group; P.G. = Prediction Group
? marks an entry that is illegible in the source copy.


A. THE WHERRY-DOOLITTLE METHOD


1. Product-moment correlations of tests with the criterion and each
other. The seven tests were correlated with the criterion. Fisher’s
Chi Square test (Guilford 6, p. 236) was applied and no significant
deviations from linearity were found. In general the intercorrelations
are quite low; test number 4 shows slight negative correlation with
tests 5, 6, and 7. The validity coefficients and intercorrelations are
shown in Table VI.

Test number 2, showing the highest correlation with the criterion,
r = .308, was selected as the first test, and the Wherry-Doolittle
method was used to select the other tests to be included in the
battery.

2. The Multiple R. As each test was added to the battery, Wherry’s
shrinkage formula was applied. Table VII shows the change in Multiple
R and Multiple R² as this process was carried through.

As shown in Table VII the Multiple R reached its maximum value with
the addition of test 7 and declined in magnitude with the addition of
tests 5 and 3. Consequently tests 2, 4, 1, 6, and 7 were included in
the final battery.*

3. The Regression Weights. Beta and b weights and the constant (K)
were calculated. Their values are given in Table VIII.
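
The use made of the b weights and the constant can be illustrated by a short calculation. The following is a minimal Python sketch, assuming the usual linear form of the regression equation (predicted criterion-score = K plus the sum of each b weight times the corresponding test score). The weights and the constant are those printed in Table VIII; the student's scores and the function name are hypothetical.

```python
# b weights and constant K as printed in Table VIII (standard group).
B_WEIGHTS = {2: 0.525566, 4: 0.234218, 1: 0.467067, 6: -0.090584, 7: 0.162027}
K = 68.289


def predicted_criterion(scores):
    """Predicted course grade from a dict of test number -> test score."""
    return K + sum(b * scores[test] for test, b in B_WEIGHTS.items())


# Hypothetical student; tests 3 and 5 are not in the battery.
student = {1: 10, 2: 11, 4: 15, 6: 14, 7: 16}
print(round(predicted_criterion(student), 2))
```

For the hypothetical student the predicted grade is about 83.58. Note that test 6 carries a negative weight, so a higher score on it lowers the prediction slightly.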


B. THE MULTIPLE CUTTING-SCORE METHOD


1. The critical scores. It was decided that with 
250 cases in the standard group it should be 
possible to have six sets of critical scores, 
A, B, C, D, E, and F. 

In order to obtain these critical scores the 
cards for the standard group were sorted into 
order of criterion scores from highest to lowest. 
Ties were broken on the basis of final exami- 
nation grade. The top 12.5 per cent of the cards 
were then separated from the remainder and the 
means were calculated for each of the seven 
tests, to give the A critical scores. The same 
process was carried through in order to deter- 
mine the values of the B, C, D, E, and F 
critical scores (see Table IX). Table X gives the
means obtained. 
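
The grouping used to obtain the six sets of means can be sketched as follows. This is a minimal Python illustration and not the writer's card-sorting procedure: the data are hypothetical, ties keep their input order rather than being broken on the final examination grade, and the rounding of group sizes to whole cases is an assumption of the sketch.

```python
from statistics import mean

# Fraction of cases in each criterion group. A positive fraction is taken
# from the top of the criterion ranking, a negative one from the bottom.
STANDARDS = {"A": 0.125, "B": 0.25, "C": 0.50, "D": 1.0, "E": -0.25, "F": -0.125}


def critical_score_means(cases):
    """cases: list of (criterion score, test score) pairs for one test.

    Returns a dict mapping each standard to the mean test score of its
    criterion group, before rounding to a critical score.
    """
    ranked = sorted(cases, key=lambda c: c[0], reverse=True)
    n = len(ranked)
    means = {}
    for label, fraction in STANDARDS.items():
        k = max(1, round(abs(fraction) * n))  # group size (assumed rounding)
        group = ranked[:k] if fraction > 0 else ranked[-k:]
        means[label] = mean(score for _, score in group)
    return means


# Eight hypothetical cases: (course grade, score on one test).
cases = [(98, 20), (95, 18), (92, 15), (90, 16),
         (85, 12), (80, 13), (75, 9), (70, 8)]
print(critical_score_means(cases))
```

The groups overlap by design: the cases that define the A mean are also counted in the B, C, and D means.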


* The decision as to how many tests should be 
included in the Wherry-Doolittle battery was a 
difficult one. Obviously little or nothing was 
gained by including tests 6 and 7; however, the 
writer decided to follow mechanically the rule 
of adding tests as long as R increased. 


TABLE VI
INTERCORRELATIONS OF TESTS AND CRITERION (C)

Variable    C      1      2      3      4      5      6      7
C                  ?    .308   .128   .258   .082   .023   .115
1                         ?    .408   .273   .223   .401   .207
2                              .332   .165   .285   .246   .153
3                                       ?      ?      ?      ?
4                                            -.063  -.025  -.007
5                                                   .164   .233
6                                                          .410
7

? marks an entry that is illegible in the source copy.


TABLE VII

THE INCREASE IN MULTIPLE R AS EACH TEST WAS ADDED

Name of Test      R²       R
2                         .308
4                .136     .368
1                .158     .397
6                .158     .398
7                .162     .403
5                .161     .402
3                .158     .398
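
The rule followed in fixing the length of the Wherry-Doolittle battery (add tests as long as the corrected R increases) can be checked against the values of Table VII. The following is a minimal Python sketch of that stopping rule only; it does not reproduce the Wherry-Doolittle computations themselves, and the function name is illustrative.

```python
# (test added, corrected multiple R) in the order given in Table VII.
TABLE_VII = [(2, 0.308), (4, 0.368), (1, 0.397), (6, 0.398),
             (7, 0.403), (5, 0.402), (3, 0.398)]


def select_battery(steps):
    """Keep adding tests until the corrected R stops increasing."""
    battery = []
    best_r = float("-inf")
    for test, r in steps:
        if r <= best_r:
            break  # this test no longer raises the corrected R
        battery.append(test)
        best_r = r
    return battery, best_r


print(select_battery(TABLE_VII))
```

Applied to Table VII the rule stops before test 5, leaving tests 2, 4, 1, 6, and 7 with a corrected R of .403.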
TABLE VIII

THE TEST WEIGHTS AND THE CONSTANT, K

Name of Test     Beta Weights     b Weights
2                  .215882         .525566
4                  .180735         .234218
1                  .178368         .467067
6                 -.101045        -.090584
7                  .093984         .162027

Constant (K) = 68.289
TABLE IX

DEFINITION OF CRITICAL SCORES

Critical
Score      Definition
A          Mean test scores of 12.5% having highest criterion scores
B          Mean test scores of 25% having highest criterion scores
C          Mean test scores of 50% having highest criterion scores
D          Mean test scores of total standard group
E          Mean test scores of 25% having lowest criterion scores
F          Mean test scores of 12.5% having lowest criterion scores


TABLE X 


From the means in Table X the critical scores 
were established by rounding to the nearest
whole number. Means ending in .5 were 
rounded to the next higher whole number. In 
the case of reversals the highest mean below 
the reversal was used as the critical score for 
that standard and all those above. The effect of 
so doing can be seen by comparing the means in 
Table X with the critical scores in Table XI. 
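
The rounding and reversal rules can be illustrated in code. The sketch below is one possible reading of the reversal rule, not a statement of the writer's exact procedure: a standard is treated as "regular" when no higher standard has a lower mean and no lower standard has a higher mean, and every standard caught in a reversal takes the highest regular mean below it. The function names and the fallback for a test with no regular standard are assumptions. Under this reading the Table X means for tests 1 and 5 reproduce the corresponding critical scores.

```python
import math

def round_half_up(x):
    """Round to the nearest whole number; values ending in .5 go up."""
    return math.floor(x + 0.5)

def critical_scores(means):
    """Turn six group means (ordered A..F) into integral critical scores."""
    n = len(means)

    def is_regular(i):
        # No higher standard has a lower mean, and no lower standard a higher one.
        return (all(means[j] >= means[i] for j in range(i)) and
                all(means[j] <= means[i] for j in range(i + 1, n)))

    scores = []
    for i in range(n):
        # First standard at or below i that is not involved in a reversal.
        k = next((j for j in range(i, n) if is_regular(j)), None)
        base = means[k] if k is not None else min(means[i:])  # assumed fallback
        scores.append(round_half_up(base))
    return scores

test_5 = critical_scores([77.6, 77.1, 79.0, 78.3, 76.2, 73.9])
test_1 = critical_scores([11.8, 11.3, 11.2, 10.5, 9.9, 8.7])
```

For test 5 the A, B, C, and D means are all involved in reversals, so each takes the E mean of 76.2 as its base.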

2. Classifying the standard group according to 
critical scores. Classifying the standard group 
on each test was a simple process of sorting the 
250 cards into seven groups, those passing each 
of the critical scores on a test and those failing 
all of them. Table XII (which corresponds to 
Table I in Chapter I) shows the results ob- 
tained. 

3. Determination of the relative usefulness of
the tests graphically. The data included in
Table XII are shown graphically in Figure 1.
By inspection of this figure the order of value
of the tests was established as 4, 1, 2, 7, 6, 5, 3.*


* The inspection process for determining the
order of value of the individual tests is not
difficult; however, a few pointers may be useful.
On a graph such as Figure 1, the amount of
correlation between a test and the criterion is
shown by the slope of the line connecting the
A, B, C, D, E, and F points for that test. If
the slope is upward to the right the correlation
is negative, and if it is upward to the left the
correlation is positive. The greater the range of
criterion scores between the A and F points
the higher the correlation of the test with the
criterion. Vertical distance on the graph represents
the per cent of cases selected; consequently
some consideration should be given to the
spacing of the A, B, C points along the lines.
The percentage selected by the A standard on
any given test should not be too small, probably
not less than 30, if the test is to be used
with others in a battery, because many individuals
who meet the A standard on one test
will fail it on another. Consequently, if the A
standard is too high on the individual tests


TABLE X
MEAN TEST SCORES OF SIX GROUPS ESTABLISHED ON THE BASIS OF CRITERION SCORES

                           Mean Values
Test    Upper    Upper    Upper    Total    Lower    Lower
        12.5%    25%      50%      Group    25%      12.5%
  1     11.8     11.3     11.2     10.5      9.9      8.7
  2     12.3     11.7     11.5     10.9      9.48     8.7
  3      9.2      8.9      8.6      8.1       …        …
  4     18.1     17.5     16.3     14.9     12.8     11.8
  5     77.6     77.1     79.0     78.3     76.2     73.9
  6     15.2     14.47    14.6     13.6     12.1     10.7
  7     16.9     16.6     16.4     16.0     15.5     14.7


[Figure 1: x-axis, Mean Criterion Score; y-axis, Per Cent Selected.]

FIGURE 1. Mean Criterion Score and Per Cent Selected by Each Test. (Tests identified by numbers, critical scores by letters.)


TABLE XI
INTEGRAL VALUES OF CRITICAL SCORES

             Critical Scores or Standards
Test      A      B      C      D      E      F
  1      12     11     11     11     10      9
  2      12     12     12     11      9      9
  3       9      9      9      9      …      …
  4      18     18     16     15     13     12
  5      76     76     76     76     76     74
  6      15     14     14     14      …      …
  7      17     17     16     16     16     15


TABLE XII
MEAN CRITERION SCORE (M) AND PERCENTAGE SELECTED AT EACH CRITICAL-SCORE LEVEL

(Columns for critical scores C and D)

               C               D
Test       M      %        M      %
  1      85.63    51       …      51
  2      85.46    43       …      56
  3      84.17    53       …      53
  4      85.90    40       …      50
  5      84.27    51       …      51
  6      84.48    51       …      51
  7      84.09    62       …      62


TABLE XIII
MEAN CRITERION SCORES (M) AND PERCENTAGES SELECTED BY TEST 4 ALONE AND BY TESTS 4 AND 1 AT EACH CRITICAL-SCORE LEVEL

Tests in         A            B            C            D            E            F
Battery       M     %      M     %      M     %      M     %      M     %      M     %
4           85.85   …    85.63  51    85.63  51    85.63  51    85.18  64    84.71  73
4 and 1     87.07   …    87.16  20    87.03  24    87.00  27    85.68  44    85.21  52


4. Construction of trial batteries. After the 
estimated order of predictive merit of the tests 
had been established, the combination of tests 
into batteries was a simple card-sorting process. 
Test number 4, being the most predictive single 
test, was used as the basis of all trial batteries. 
Test number 1, the second most predictive test, 
was added to form battery number 1. This 
battery was constructed by sorting the cards into 
six groups on the basis of scores on tests 4 and 
1. The A group was made up of those cards on 
which the scores on tests 4 and 1 were equal 
to or higher than the A standard or A critical 
score for both tests. The B, C, D, E, and F 
groups were sorted in the same manner. The 
mean criterion scores and the per cents selected 
were then calculated for each of the six groups. 
Table XIII shows the results (for the standard 
group) of the addition of test 1 to test 4. 
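
The card sort for a trial battery can be sketched as follows. This is a minimal illustration, not the writer's procedure: the card layout (`scores`, `criterion`) and the function names are hypothetical, and the critical scores are those read from Table XI for tests 4 and 1. A card is selected at a level only if it equals or exceeds the standard on every test in the battery, so the groups are cumulative (a card meeting the A standard also meets B through F).

```python
LEVELS = ["A", "B", "C", "D", "E", "F"]

# Critical scores for tests 4 and 1 as read from Table XI.
CRITICAL = {
    4: {"A": 18, "B": 18, "C": 16, "D": 15, "E": 13, "F": 12},
    1: {"A": 12, "B": 11, "C": 11, "D": 11, "E": 10, "F": 9},
}

def meets(card_scores, battery, level):
    """True if the card equals or exceeds the standard on every test in the battery."""
    return all(card_scores[t] >= CRITICAL[t][level] for t in battery)

def battery_summary(cards, battery):
    """Return {level: (mean criterion score or None, per cent selected)}."""
    summary = {}
    for level in LEVELS:
        chosen = [c for c in cards if meets(c["scores"], battery, level)]
        m = sum(c["criterion"] for c in chosen) / len(chosen) if chosen else None
        summary[level] = (m, 100.0 * len(chosen) / len(cards))
    return summary

# Four made-up cards, for illustration only.
cards = [
    {"scores": {4: 18, 1: 12}, "criterion": 90},
    {"scores": {4: 18, 1: 11}, "criterion": 86},
    {"scores": {4: 13, 1: 10}, "criterion": 82},
    {"scores": {4: 11, 1: 12}, "criterion": 78},
]
both = battery_summary(cards, [4, 1])
alone = battery_summary(cards, [4])
```

Adding a test can only keep or shrink the group selected at a given level, which is why each addition has to be weighed against the percentage it removes.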

From inspection of Table XIII it was obvi- 
ous that tests 4 and 1 were more predictive of 
the criterion than the test 4 alone. Therefore, 


no one will be able to meet it on all of the
tests in the battery. The percentage who fail
the lowest critical score should be small for
the same reason. On the graph the ideal test
would appear as a long straight line extending
from the lower right to the upper left. After
the individual tests have been combined into
batteries these batteries may be evaluated in the
same manner, from a graph such as Figure 2.


both of these tests were included in all other 
trial batteries. To them were added in order 
of merit each of the five remaining tests (2, 7, 
6, 5, and 3). However, as none of these additions 
increased the predictive power of the battery 
without drastically reducing the percentage 
selected, none was accepted for the final battery. 
Table XIV shows the relative effectiveness of 
all trial batteries. 

Inspection of Table XIV showed that the 
battery made up of tests 4 and 1 was more 
effective than any of the other trial batteries 
obtained by the multiple cutting-score technique 
described above. However, following Ruch’s 
method, the data in this table were plotted on
graph paper. The result is Figure 2.

From inspection of Table XIV and Figure 2 
it was obvious that tests 4 and 1 constituted the 
single best battery. However, the battery con- 
sisting of tests 4, 1, and 5 was more effective 
at the highest selection level. (The cases which 
were selected at the A level by tests 4, 1, and 5 
had a mean criterion score of 88 while those 
selected at the A level by tests 4 and 1 had a 
mean criterion score of only 87). Therefore, test 
number 5 was added to the final battery (1 and 
4), for use at the highest critical-score level 
only. An AA critical score was added at the 
top to make this possible. The final battery 
then consisted of tests 4, 1, and 5 at the AA 
critical-score level, and tests 4 and 1 at the 


TABLE XII (continued)

(Columns for critical scores A, B, E, and F)

               A              B              E              F
Test       M      %       M      %       M      %       M      %
  1      85.85    38    85.63    51    85.18    64    84.71    73
  2      85.46    43    85.46    43    84.73    78    84.73    78
  3      84.17    53    84.17    53    84.16    66    84.16    66
  4      86.05    33    86.05    33    84.89    63    84.83    67
  5      84.27    51    84.27    51    84.27    51    83.67    58
  6      84.52    48    84.48    51    84.02    61    84.13    64
  7      84.47    50    84.47    50    84.09    62    84.22    70


[Figure 2: x-axis, Mean Criterion Score; y-axis, Per Cent Selected. Battery tests plotted: 4+1, 4+1+2, 4+1+7, 4+1+6, 4+1+5.]

FIGURE 2. Mean Criterion Score and Per Cent Selected by Trial Batteries. (Batteries identified by numbers, critical scores by letters.)

TABLE XIV
MEAN CRITERION SCORES (M) AND PERCENTAGES SELECTED BY THE TRIAL BATTERIES AT EACH CRITICAL-SCORE LEVEL IN STANDARD GROUP

Composition        A            B            C            D            E            F
of Battery      M     %      M     %      M     %      M     %      M     %      M     %
4+1           87.07  17    87.16  20    87.03  24    87.00  27    85.68  44    85.21  52
4+1+2         85.81  10    85.93  17    86.29  12    86.55  19    85.82  37    85.51  43
4+1+7         87.24   8    87.62  10    87.21  15    87.14  18    86.08  28    86.05  37
4+1+6         86.48   9    86.39  11    86.31  14    86.44  16    85.96  27    85.57  34
4+1+5         88.07  11    87.97  12    87.51  15    87.51  16    86.14  25    85.66  30
4+1+3         86.48  12    86.71  14    86.55  17    86.79  19    85.93  32    85.31  37


A, B, C, D, E, and F levels. Table XV shows 
the critical scores for the tests making up this 
final battery. (Cf. Table XI). 

These critical scores were then applied to the 
standard group from which they had been de- 
rived. The extent to which this final battery 
of critical scores predicted the criterion scores 
for this group can be seen in Table XVI. The 
data in Table XVI serve the same purpose in 
the multiple cutting-score method as does the 
R in the multiple correlation methods. They 
indicate the degree of success that is likely to 
follow the use of the battery for predictive 
purposes in comparable samples. 

From inspection of Table XVI it is obvious 
that the battery is not highly predictive of the 
criterion. The two indicators of this fact are 
the complete lack of discrimination between the 
A and D critical scores, and the relatively small 
differences between the mean criterion scores 
at the AA, F, and FF (failure) levels. In 
standard deviation units the difference in the 
mean criterion scores of the best 11.2% and the 
poorest 48% is only 0.77.
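
The figure of 0.77 can be checked from the values in Table XVI. The symbols M_AA and M_FF below are shorthand introduced here for the mean criterion scores at the AA and FF levels; they are not the writer's notation.

```latex
\[
\frac{M_{AA} - M_{FF}}{\text{S.D.}}
  = \frac{88.07 - 82.06}{7.81}
  = \frac{6.01}{7.81}
  \approx 0.77
\]
```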

The following section deals with the appli- 
cation of these critical scores and the regression 
weights to the prediction group. 


C. APPLICATION OF THE CRITICAL SCORES AND 
REGRESSION WEIGHTS TO THE PREDICTION GROUP 


TABLE XV
CRITICAL SCORES FOR TESTS IN FINAL BATTERY

              Critical Scores or Standards
Test     AA      A      B      C      D      E      F
  4      18     18     18     16     15     13     12
  1      12     12     11     11     11     10      9
  5      76      -      -      -      -      -      -

TABLE XVI
MEAN CRITERION SCORES (M) AND PER CENT SELECTED BY THE FINAL BATTERY, IN THE STANDARD GROUP

Critical Scores        Mean Criterion*    Per Cent Selected
AA                         88.07               11.20
A                          87.07               16.80
B                          87.16               20.40
C                          87.03               24.00
D                          87.00               27.20
E                          85.68               44.40
F                          85.21               52.00
FF (all below F)           82.06               48.00

* S.D. of criterion scores = 7.81

The critical scores and the regression weights developed from the standard group were now applied to the prediction group. The application of the multiple cutting-score battery was a simple process of sorting the 250 cards into groups according to the critical-score levels, and resulted in each individual receiving a prediction score of AA, A, B, C, D, E, F, or FF (failure). The application of the Wherry-Doolittle battery involved considerably more time and calculation, and resulted in assigning to each individual a composite prediction score which was the sum of his five individual test scores after each had been multiplied by the appropriate b weight.
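
The composite prediction score for the Wherry-Doolittle battery can be sketched in a few lines. The b weights and the constant K are those read from Table VIII; the function name and the test scores in the example are hypothetical. Whether K was added in the writer's hand computation is not stated, so its inclusion here is an assumption; it shifts every score equally and does not change the ranking of cases.

```python
# b weights and constant K as read from Table VIII (test number -> weight).
B_WEIGHTS = {2: 0.525566, 4: 0.234218, 1: 0.467067, 6: -0.090584, 7: 0.162027}
K = 68.289

def predicted_criterion(scores):
    """Composite prediction: K plus the b-weighted sum of the five test scores."""
    return K + sum(b * scores[test] for test, b in B_WEIGHTS.items())

# Made-up test scores for one individual, for illustration only.
example = predicted_criterion({2: 10, 4: 15, 1: 10, 6: 14, 7: 16})
```

By contrast, the multiple cutting-score method needs no multiplication at all, only a comparison of each score with its critical score.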


TABLE XVII
COMPARISON OF THE GROUPS SELECTED BY THE TWO METHODS

Level of     Cases Selected     Common Cases              Mean Criterion     S.E. of
Selection     No.      %         No.     %         r      Score, C-S***      Diff.      C.R.
AA             32    12.80        …      …       .438        88.25           1.323      .446*
A              47    18.80        …      …       .575        86.96            .987        …
B              64    25.60        …      …       .672        86.53              …       .752
C              72    28.80        …      …       .681        86.25            .708      .565
D              80    32.00        …      …       .688        86.28            .647      .170
E             117    46.80        …      …         …         85.50            .501      .279
F             132    52.80        …      …       .780        85.27            .462      .455

* Starred critical ratio favors multiple cutting-score method; all others favor the Wherry-Doolittle method.
** Wherry-Doolittle
*** Multiple Cutting-Score


D. COMPARISON OF THE BATTERY SELECTED BY THE MULTIPLE CUTTING-SCORE METHOD AND THE BATTERY SELECTED BY THE WHERRY-DOOLITTLE METHOD


The two batteries were compared to 
each other in four ways: (1) predictive 
efficiency, (2) the amount of shrinkage 
attendant upon the application of the 
battery to the prediction group, (3) the 
number of tests required, and (4) the 
time required to select the battery and 
calculate the predicted criterion scores. 

1. Relative predictive efficiency. The 
predictive efficiencies of the two batteries 
were compared in two ways, (1) by means 
of the correlation of predicted criterion 
scores with the actual criterion scores, 
and (2) in terms of the mean criterion 
scores of the group selected. 

The coefficient of contingency (3, p. 338) between the predicted criterion scores by the Wherry-Doolittle method and the actual criterion scores is .455,³ as compared with .469³ between the multiple cutting-score predicted criterion scores (AA, A, B, C, D, E, F, and FF) and actual criterion scores. The contingency coefficient between the two sets of predicted criterion scores is .657.⁴

³ These contingency coefficients are uncorrected for "broad categories" as they were calculated from 8 × 8 fold tables, which (3, p. 338) gives them a possible maximum value of .935.
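
The contingency coefficient and its ceiling for a square table can be computed as sketched below. This uses the standard Pearson definition, C = sqrt(chi-square / (N + chi-square)), with an upper limit of sqrt((k - 1) / k) for a k × k table; the function names and the small frequency tables are illustrative assumptions, not the writer's data.

```python
import math

def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (N + chi2)) for a two-way frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n + chi2))

def max_contingency(k):
    """Upper limit of C for a k x k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)

ceiling_8x8 = round(max_contingency(8), 3)                   # eight-by-eight table
perfect_2x2 = contingency_coefficient([[10, 0], [0, 10]])    # perfect association
no_relation = contingency_coefficient([[5, 5], [5, 5]])      # no association
```

Because the ceiling for an 8 × 8 table is below 1, coefficients such as .455 and .469 understate the relationship somewhat when read as if they were ordinary correlations.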

Another way of comparing the pre- 
dictive efficiency of the two batteries is 
to compare the mean criterion scores 
of comparable groups selected by the 
two methods. In doing this the 32 indi- 
viduals selected by the multiple cutting- 
score battery as meeting the AA stand- 
ard were compared with the 32 indi- 
viduals having the highest predicted 
scores by the Wherry-Doolittle method. 
In like manner the A, B, C, D, E, and 
F groups as selected by the multiple 
cutting-score method were compared as 
to mean criterion score with the best 
47, 64, 72, 80, 117, and 132 cases as se- 
lected by the Wherry-Doolittle method. 
Table XVII shows the critical ratios of
the differences between the mean crite- 
rion scores of those selected by the two 
methods. 

The first column of Table XVII gives the selection level, the second and third columns show the number and percentage selected at this level, and the next two columns (headed common cases) show the number and percentage of cases which were selected by both the multiple cutting-score and the Wherry-Doolittle methods. The sixth column, headed r, shows the correlation between the criterion scores of the groups selected by the two methods. The r was calculated from the proportion of overlapping cases (13, p. 122).⁵ Columns 7 and 8 give the mean criterion scores for the groups selected by each method. Column 9 gives the differences between these means, column 10 shows the standard error of this difference,⁶ and column 11 gives the critical ratio.

⁴ Footnote 3 is applicable.

TABLE XVII (continued)

Level of      Mean Criterion
Selection     Score, W-D**       Diff.
AA               87.66            .59
A                87.03             …
B                87.06            .53
C                86.65            .40
D                86.39            .11
E                85.64             …
F                85.48             …

TABLE XVIII
COMPARATIVE SHRINKAGE BY WHERRY-DOOLITTLE AND MULTIPLE CUTTING-SCORE METHODS**

               Cases Selected by W-D Battery     Cases Selected by M C-S Battery
Level of           Mean Criterion Score              Mean Criterion Score
Selection      S-Group    P-Group     Diff.              S-Group
AA              89.18      87.66     -1.52*               88.07
A               87.81      87.03      -.78                87.07
B               87.20      87.06      -.14                87.16
C               86.98      86.65      -.33                87.03
D               86.31      86.39      +.08                  …
E               85.68      85.64        …                 85.68
F               85.85      85.48        …                 85.21
Total           83.68      83.59        …                 83.68

* Minus differences indicate shrinkage as loss in predictive efficiency, and plus differences indicate no loss through shrinkage but rather a chance gain in predictive efficiency.
** W-D battery = Wherry-Doolittle battery
   M C-S battery = Multiple Cutting-Score battery
   S-Group = Standard group
   P-Group = Prediction group

Neither method of comparison shows 
any significant difference in the predic- 
tive efficiencies of the methods. The in- 
significant differences were in favor of 
the multiple cutting-score method at the 


⁵ $$ r = \frac{N_c}{N_c + N_a} $$ where $N_c$ = the number of common cases, and $N_a$ = the number of uncommon cases.


highest selection level and in favor of 
the Wherry-Doolittle method at all 
lower levels of selection. 
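
The two quantities behind Table XVII can be illustrated briefly: the overlap estimate of r from footnote 5 and the critical ratio of a difference between means. The function names are assumptions, and the counts of 14 common and 18 uncommon cases are hypothetical, chosen only because they give a value agreeing with the r of .438 shown at the AA level. The difference of .59 and the standard error of 1.323 are the AA-level values from Table XVII.

```python
def overlap_r(n_common, n_uncommon):
    """r estimated from the proportion of overlapping cases: Nc / (Nc + Na)."""
    return n_common / (n_common + n_uncommon)

def critical_ratio(mean_diff, se_diff):
    """Critical ratio: difference between two means over its standard error."""
    return mean_diff / se_diff

r_aa = overlap_r(14, 18)                        # hypothetical counts
cr_aa = round(critical_ratio(0.59, 1.323), 3)   # AA-level values from Table XVII
```

A critical ratio this far below the conventional levels is what leads the writer to call the differences insignificant.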

2. Comparative extent of shrinkage. The
comparative extent of shrinkage by the 
two methods was determined by compar- 
ing the losses in predictive efficiency 
which occurred when the batteries which 
had been developed upon the standard 
group were applied to the prediction 
group. These changes are shown in terms 
of mean criterion scores in Table XVIII. 

While there are no formulas available 
for the calculation of the significance 
of the differences in shrinkage between 
the two methods, they appear to be in- 
significant. However, it is interesting to 
note that at the highest selection level 
(AA) the multiple cutting-score battery 
shows the least shrinkage and the 
Wherry-Doolittle battery the most. 
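
The shrinkage comparison at the AA level can be reproduced with simple subtraction. The function name is an assumption; the standard-group means are taken from Table XVIII (Wherry-Doolittle) and Table XVI (multiple cutting-score), and the prediction-group means from Table XVII.

```python
def shrinkage(standard_mean, prediction_mean):
    """Change in mean criterion score from standard to prediction group.

    Negative values indicate shrinkage, positive values a chance gain.
    """
    return round(prediction_mean - standard_mean, 2)

wd_aa = shrinkage(89.18, 87.66)    # Wherry-Doolittle battery, AA level
mcs_aa = shrinkage(88.07, 88.25)   # multiple cutting-score battery, AA level
```

On these figures the Wherry-Doolittle battery loses about a point and a half at the AA level while the multiple cutting-score battery shows a small chance gain.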

3. The number of tests required by each method. Three tests (4, 1, and 5) were used in the multiple cutting-score battery, and five tests (2, 4, 1, 6, and 7) were used in the Wherry-Doolittle battery. However, the Wherry-Doolittle battery might well have been limited to tests 2, 4, and 1, as the addition of tests 6 and 7 increases multiple R only .006.

TABLE XVIII (continued)

               Cases Selected by M C-S Battery
Level of           Mean Criterion Score
Selection         P-Group        Diff.
AA                   …             …
A                  86.96         -.11
B                  86.53         -.63
C                  86.25         -.78
D                  86.28           …
E                  85.50         -.18
F                    …           +.06

4. Comparison of the two methods as to
time spent in developing the batteries 
and applying them to the prediction 
group. It required approximately 60 


hours to select the battery and apply 
the regression weights to the prediction 
group by the Wherry-Doolittle method, 
while the multiple cutting-score tests 
were selected and applied in approxi- 
mately 20 hours. 




CHAPTER V 


A SUGGESTED MODIFICATION OF THE MULTIPLE 
CUTTING-SCORE METHOD 


IN THE process of developing the multiple
cutting-score battery it became
obvious that a slight addition to Ruch’s
method of graphically representing the 
selectivity of the individual tests (Fig- 
ure 1) would facilitate the process of se- 
lecting the tests to be used in the final 
battery. This addition consisted in in- 
cluding the FF (failing) group for each 
test. The FF point shows the percentage 
rejected by a test and the mean criterion 
score of this rejected group. For each 
test this point on the graph was con- 
nected by a straight line to the point 
representing the mean criterion score 
and per cent selected by the lowest criti- 
cal score. In general, the lower the mean 
criterion score of the cases in the FF 
group, and the lower the percentage of 
cases in this group, the better the test. 
Figure 3 shows the proposed method. 
(Compare with Figure 1). 

Selection of a Multiple Cutting-Score 
battery from Figure 3. In order to check 
the value of the change in method a 
battery of tests was selected by inspec- 
tion of the new graph (Figure 3). Tests 
4, 1, and 5 were immediately chosen for 
the same reasons that they had been 
chosen from the other graph. However, 
inspection of Figure 3 suggested that
test number 2 should also be tried because
of the exceptionally low percentages
of subjects failing the lowest (F)
critical score, and because these failures 
had a very low mean criterion score. On 
the basis of this reasoning, a new mul- 
tiple cutting-score battery composed of 
tests 4, 1, 5, and 2 was selected. Tests 
4, 1, and 5 were used exactly as used in 
the original study, 4 and 1 being used 
at all levels of selection while 5 was 
used at the AA level only. Test number 
2, however, was used only at the lowest 
level of selection in order to take ad- 
vantage of its predictive accuracy at the 
lower end of the scale without too dras- 
tically reducing the total number of 
cases selected. Table XIX shows the critical
scores for each of the tests making
up the battery.

Application of the Multiple Cutting- 
Score battery to the prediction group. 
The 250 cards of the prediction group 
were sorted into eight groups, those 
which met the AA, A, B, C, D, E, and 
F standards and those who failed the 
lowest critical score. Table XX shows 
the number of cases selected and the 
mean criterion score at each level of se- 
lection. 

Comparison of the criterion scores of 
those selected by the modified cutting- 
score method with comparable groups 


TABLE XIX
CUTTING-SCORES

                    Critical Scores
Test     AA      A      B      C      D      E      F
  4      18     18     18     16     15     13     12
  1      12     12     11     11     11     10      9
  5      76      -      -      -      -      -      -
  2       9      9      9      9      9      9      9


[Figure 3: x-axis, Mean Criterion Score.]

FIGURE 3. Graphic Representation of the Selectivity of the Individual Tests. (Tests are identified by numbers, critical scores by letters. Points A, B, C, D, E, and F mark percentages selected and points FF mark percentages rejected.)


TABLE XX
MEAN CRITERION SCORE AND NUMBER OF CASES SELECTED AT EACH CRITICAL SCORE LEVEL

Critical Score                   Mean Criterion
Level                 No.        Score
AA                     30          88.83
A                      44          87.39
B                      57          87.00
C                      64          86.81
D                      71          86.80
E                     103          85.76
F                     113          85.59
FF (failures)         137          81.93


selected by the Wherry-Doolittle battery. 
A comparison of the mean criterion 
scores of the groups selected by the mul- 
tiple cutting-score battery of four tests 
and the Wherry-Doolittle battery of five 
tests is given in Table XXI. It will be 




noted from Table XXI that again the 
differences between the criterion means 
obtained by the two methods are not 
large enough to be statistically significant 
nor practically important. However, it 
should be noted that the multiple cut- 
ting-score battery developed from the 
revised graph is equal to the Wherry- 
Doolittle battery in predictive efficiency 
at even the lowest (F) level of selection. 
Comparison of Table XXI and Table 
XVII shows that the superiority of the 
multiple cutting-score method at the 
AA level is increased from .59 to 1.40. 
This change is statistically insignificant, 
the difference of 1.40 amounting to 0.18 
in terms of the standard deviation of 
the criterion. 


TABLE XXI
COMPARISON OF THE MEAN CRITERION SCORES OF THE CASES SELECTED BY THE WHERRY-DOOLITTLE AND THE MULTIPLE CUTTING-SCORE BATTERY

Selection              Cases               Mean Criterion Scores
Level               No.      %         W-D Method*    M C-S Method**
AA                   30    12.00            …             88.83
A                    44    17.60          87.45           87.39
B                    57    22.80          87.14           87.00
C                    64    25.60            …             86.81
D                    71    28.40          86.55           86.80
E                   103    41.20            …             85.75
F                   113    45.20            …             85.59
FF (failures)       137    54.80            …             81.93

* Wherry-Doolittle Method.
** Multiple Cutting-Score Method.


CHAPTER VI

SUMMARY AND DISCUSSION OF RESULTS

The Problem. A comparative study was made of the Wherry-Doolittle (23) method and the multiple cutting-score method of Ruch (16). The problem was to discover which method was more useful for the selection of a battery of tests to be used in predicting success in elementary accounting classes.

The Subjects, Materials, and Procedure. The subjects were 500 students of elementary accounting. Seven five-minute tests were administered to the … and course grade for each subject were recorded on a 5" × 8" card, and the cards were sorted into two groups of 250 each on the basis of cou… One of these groups was labelled the standard group and the other the prediction group.

The prediction-group cards were laid aside while the standard-group scores were used in the selection of two batteries of tests, one by the well known Wherry-Doolittle method, and the other by the multiple cutting-score method as applied by Ruch (16).

The multiple regression weights which had been determined on the standard group by the Wherry-Doolittle method and the multiple cutting-scores developed on the standard group by Ruch’s method were both applied to the test scores of the prediction group. The multiple cutting-score method categorized the 250 subjects in the prediction group as follows: AA, best 32 cases; A, best 47 cases; B, best 64 cases; C, best 72 cases; D, best 80 cases; E, best 117 cases; F, best 132 cases; FF, poorest 118 cases. The best 32, 47, 64, etc., cases were then selected on the basis of the predicted criterion scores by the Wherry-Doolittle method; and the mean criterion scores (grades) for each of these groups were compared with the mean criterion scores of the comparable group selected by the multiple cutting-score method. A further comparison of the predictive efficiencies of the two batteries was made by calculating contingency coefficients between the actual and the predicted criterion scores.

Results. The mean criterion score of the best 12.8 per cent (thirty-two cases) selected by the multiple cutting-score method was slightly higher than the mean criterion score of the top 12.8 per cent selected by the Wherry-Doolittle method (C.R. diff. = .446). At all lower levels of selection (18.8 to 52.8 per cent) the groups selected by the Wherry-Doolittle method were insignificantly superior. The critical ratios of the differences between the means ranged from .170 to .752. The correlation (C) between actual and the Wherry-Doolittle predicted criterion scores was .455, and .469 between actual and multiple cutting-score predicted criterion scores.

Suggested Modification of Ruch’s 
Method. In the process of determining 
the multiple cutting-scores it was dis- 
covered that a slight addition to Ruch’s 
method of graphically representing the 
selectivity of the individual tests would 
enable the user to select a more predictive
multiple cutting-score battery. This
change in procedure led to the selection
of a multiple cutting-score battery of
four tests which was superior to the battery
selected by the Wherry-Doolittle
method. However, the superiority, measured
in terms of the mean criterion
scores of the groups selected, was too



small to be significant. 
Discussion of Results. As measured by mean criterion scores of comparable groups and by contingency coefficients between the actual and predicted criterion scores, the predictive efficiencies of the batteries developed by the two methods were equal. With the data used in this study the multiple cutting-score method was just as accurate as the Wherry-Doolittle method. Furthermore, the multiple cutting-score method was shown, in the particular application of the present study, to have several advantages: (a) it required only one-third as much calculation time, (b) very little knowledge of statistics is required for its use, and (c) it was less affected by shrinkage.

It is not suggested that this study establishes the general superiority of the multiple cutting-score method. Many similar studies should be made in order to determine the conditions of relative superiority or inferiority. Some of the factors which are likely to be found to be important in determining which method will be more useful in any given situation are: (a) the degree to which the relationships between criterion and predictive measures, and among the predictive measures themselves, deviate from the linear assumption required by correlational methods, (b) the extent to which weakness in one ability, as indicated by a low score on one measuring instrument, may be compensated by a high score in another ability, (c) the number of tests used in the battery, (d) the range of scores on each of the tests, (e) the size of the intercorrelations, (f) the size of the validity coefficients, (g) the reliabilities of the separate tests, (h) the reliability of the criterion, and (i) the number of cases available for use in developing the battery.

Before any conclusions can be drawn as to the general superiority of either method these factors must be investigated in a variety of predictive situations. However, it seems that certain tentative conclusions can be drawn from this study and those previously done by Ruch (17), Wahoske (21), and the writer (4).

As was stated earlier, the multiple cutting-score method seems especially suited to the needs of the personnel selection situation where the problem is not that of ranking applicants in order of probable success, but is one of selecting the 10 or 15 per cent of … most likely to be successful on the job. In this particular situation, there is some basis for assuming that the multiple cutting-score method may be safely used, since in each of the above-mentioned studies it has been shown to be superior to the multiple correlation methods at high selection (10 to 15 per cent) levels. In other situations, e.g., in the prediction of academic success, where it is desirable to know each individual's probability of success, the multiple cutting-score approach has little advantage to offer, since its efficiency of discrimination among the lower 50 per cent of the group is not superior.


REFERENCES

1. FLANAGAN, JOHN C. A successive approximation solution for prediction problems involving a large number of variables. Proceedings of the Education Research Forum, Endicott, Aug., 1940. (Published by I.B.M. Corp., 590 Madison Ave., New York.)

2. FRANZEN, RAYMOND. A method for selecting combinations of tests and determining their best "cut-off points" to yield a dichotomy most like a categorical criterion, with an appendix, "Solution of the selection of the best combinations of dichotomous arrangements to distinguish a categorical criterion," by Paul F. Lazarsfeld and Raymond Franzen. Civil Aeronautics Administration, Division of Research, Report No. 12, Washington, D.C., March, 1943.

3. GARRETT, HENRY E. Statistics in psychology and education. New York: Longmans, Green, 1938. 493 pp.

4. GRIMSLEY, GLEN. Comparison of multiple cutting-scores and the multiple correlation methods in the selection of drafting trainees. Unpublished study, Lockheed Aircraft Corp., Burbank, Calif., 1944.

5. GRIMSLEY, GLEN. Draftsmen: Aptitude Tests Cut Turnover. Western Industry, Jan., 1944.

6. GUILFORD, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1942. 330 pp.

7. GUILFORD, J. P. Psychometric methods. New York: McGraw-Hill, 1936. 552 pp.

8. HORST, PAUL. An analytical formulation of the multiple cutting-score technique, in The prediction of personal adjustment. New York: Social Science Research Council Bulletin 48, 1941.

9. JACKSON, R. W. B. Approximate multiple regression weights. J. exp. Educ., 1943, 11, 221-225.

10. JENKINS, LEROY W. A quick method for multiple R and partial r's. Educ. psychol. Measmt., 1946, 6, 273-286.

11. JOHNSON, H. M. Multiple contingency versus multiple correlation, an old time-saving way of handling multiple contingency. Amer. J. Psychol., 1944, 57, 49-62.

12. LOVELL, CONSTANCE. The effect of special construction of test items on their factor composition. Psychol. Monogr., 1944, 56, No. 6.

13. PETERS, CHARLES C., AND VAN VOORHIS, WALTER R. Statistical Procedures and Their Mathematical Bases. New York: McGraw-Hill, 1940. 516 pp.

14. PRESSEY, S. L. Suggestions with regard to Professor Thurstone's "Method of Critical Scores." J. educ. Psychol., 1919, 10, 517-520.

15. RICHARDSON, MARION W. The combination of measures. In The prediction of personal adjustment, Paul Horst, Ed. New York: Social Science Research Council, Bulletin 48, 1941.

16. RUCH, FLOYD L. How to use employment tests. Los Angeles: California Test Bureau, 1943.

17. RUCH, FLOYD L. The comparative efficiency of the multiple-cutting-score method and the Wherry-Doolittle method in selecting winch operators. OSRD, 1945; Publication No. 15820, Washington, D.C., U. S. Dept. of Commerce, 1946.

18. STEAD, W. H. et al. Occupational counseling techniques, their development and application. New York: American Book Co., 1940. 260 pp.

19. THURSTONE, L. L. Mental Tests for College Entrance. J. educ. Psychol., 1919, 10, 129-142.

20. TOOPS, H. A. The L-Method. Psychometrika, 1941, 6, 249-266.

21. WAHOSKE, JEAN MATER. A study comparing the Wherry-Doolittle multiple correlation method and the multiple-cut-off method for selecting batteries of tests. Master's Thesis, Univ. of Southern Calif., 1945.

22. WALKER, H. M. Studies in the history of statistical method. Baltimore: Williams and Wilkins, 1929.

23. WHERRY, R. J. An extension of the Doolittle method to simple regression problems. J. educ. Psychol., 1941, 32, 459-464.

24. WHERRY, R. J. Two methods of estimating beta weights. J. educ. Psychol., 1938, 29, 701-709.

